Introduction

How does music relate to the lyrics? It is tempting to think that a song tries to convey some feeling or emotion, and that both the music and the lyrics are there to support this message. For example, we might expect a song with a slow beat and laid-back guitar to deal with laid-back topics, maybe a trip to the beach. At the other end of the spectrum, heavy metal would likely concern itself with darker, heavier subjects. But are these suspicions actually true? Let’s put some numbers to the hypothesis that there is in fact a relationship between music and lyrics. In the next sections I’ll take you on a journey in which we approach this topic with a statistical mindset, using modern data tools along the way.

We’ll start by picking a large body of music and, for each track in it, collecting and storing the lyrics. It would be far too cumbersome to scrape all the lyrics from the internet myself, but fortunately the Musixmatch API allows querying lyrics from code in a single API call. With an unpaid account only 30% of the lyrics of a queried track are returned, but that will do for our purposes. I automatically assign every set of lyrics a sentiment, or valence, score using the NLTK package, which offers natural language processing functionality. A low score indicates a sad feeling, whereas a high score indicates a happy one. Once the lyrics have a numerical score we can start to answer our question: how does music relate to lyrics?
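To make the scoring step a bit more concrete, here is a toy illustration of how a lexicon-based scorer can turn lyrics into a number. It is not the NLTK scorer used in this research: the word list `TOY_LEXICON`, the function `lyric_valence` and the 0-to-1 scale are all assumptions made up for this example.

```python
import re

# Hypothetical mini-lexicon: word -> polarity in [-1, 1]. A real analyzer
# ships with thousands of rated words; these few are made up for illustration.
TOY_LEXICON = {
    "love": 0.8, "happy": 0.9, "sun": 0.4, "dance": 0.5,
    "sad": -0.8, "cry": -0.6, "alone": -0.5, "kill": -0.9,
}


def lyric_valence(lyrics):
    """Return a score in [0, 1]: near 0 reads as sad, near 1 as happy."""
    words = re.findall(r"[a-z']+", lyrics.lower())
    rated = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    if not rated:
        return 0.5  # no rated words found: treat the lyrics as neutral
    mean_polarity = sum(rated) / len(rated)
    return (mean_polarity + 1) / 2  # rescale from [-1, 1] to [0, 1]


# Two made-up lyric fragments, one cheerful and one gloomy.
happy = lyric_valence("We dance in the sun, happy and in love")
sad = lyric_valence("I cry alone, so sad")
print(happy, sad)
```

Averaging word polarities ignores negation and context, which is one reason to rely on an established package rather than a hand-rolled list like this one.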

The research question is still a bit broad. We have settled on how to analyze lyrics, but not yet on which aspects of music we’ll focus on. We are going to keep the research broad and explore how the lyrics relate to the four main elements of music, that is, melody, harmony, instrumentation and rhythm. The Spotify API is used to access and preprocess the musical properties. For each element we will either confirm or refute hypotheses that intuitively make a lot of sense, but are not (yet) backed up by data.

Let us dive into it!

Corpus

The first order of business is choosing the corpus of music. We have chosen a broad research question and the corpus should reflect this. It must draw on various genres and contain a large number of songs; only then can we justify general conclusions. Because the heavy lifting of fetching the data is done by the Musixmatch and Spotify APIs, this is most certainly possible.

The exhaustive list of albums that are included in this research:

Elephant; Madvillainy; ...Like Clockwork; Street Worms; Midnights; HEROES & VILLAINS; St. Elsewhere; The White Album; Plastic Beach; Demon Days; Thriller; In the Aeroplane Over the Sea; Hawaii: Part II; WHEN WE ALL FALL ASLEEP, WHERE DO WE GO?; Dua Lipa; The Money Store; OFFLINE!; OK Computer; and Rumours

This totals over 280 tracks and 16 hours of listening time.


Playlist

Discovery


The Spotify API offers functionality that ranges from very high to very low level. Here we will use some of the high-level analyses, like valence and energy, to learn about the corpus.

When we plot energy, musical valence and lyrical valence against each other, we find something very interesting: even though energy and valence do not seem related, musical and lyrical valence appear highly correlated.
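One way to put a number on "appear highly correlated" is the Pearson correlation coefficient. The sketch below is an assumption about how such a check could look, not the computation behind the plot; the function name `pearson` and the five data points are made up and do not come from the corpus.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up values for five imaginary tracks, NOT taken from the corpus.
musical_valence = [0.20, 0.35, 0.50, 0.70, 0.90]
lyrical_valence = [0.25, 0.30, 0.55, 0.65, 0.85]
energy = [0.80, 0.30, 0.60, 0.20, 0.70]

r_valence = pearson(musical_valence, lyrical_valence)  # close to +1 here
r_energy = pearson(energy, musical_valence)            # close to 0 here
print(round(r_valence, 2), round(r_energy, 2))
```

A coefficient near +1 or -1 indicates a strong linear relationship, and one near 0 indicates little or none. With only a few hundred tracks it is still worth looking at the scatter plot itself, since a single coefficient can hide outliers.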

Melody


Intuitively, it would appear that melody encodes a lot of the valence information of a song. The melody is usually the most memorable part and is often indicative of the feel of a song. So it makes sense to look at the melody of two tracks, one with low and one with high lyrical valence. A suitable visualization tool is the chromagram, which shows for each moment in time which notes are being played, as estimated with the Fourier transform. Let’s try this and see if any melody lines become apparent.
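To show what goes into a chromagram, here is a minimal sketch that computes the chroma vector of a single frame of audio: take the Fourier transform, then fold the energy of every frequency bin onto one of the twelve pitch classes. The function `chroma_vector`, the frequency limits and the naive transform are assumptions chosen to keep the example short; the chromagrams in this research come from Spotify's analysis, not from this code.

```python
import cmath
import math

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def chroma_vector(samples, sample_rate, fmin=55.0, fmax=2000.0):
    """Fold the spectral energy of one audio frame onto 12 pitch classes."""
    n = len(samples)
    chroma = [0.0] * 12
    for k in range(1, n // 2):
        freq = k * sample_rate / n
        if freq < fmin or freq > fmax:
            continue
        # Naive discrete Fourier transform coefficient for bin k.
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(samples))
        # Map the bin frequency to the nearest MIDI note (A4 = 440 Hz = 69).
        midi = 69 + 12 * math.log2(freq / 440.0)
        chroma[round(midi) % 12] += abs(coeff) ** 2
    total = sum(chroma)
    return [c / total for c in chroma] if total else chroma


# Synthetic test frame: a pure 440 Hz tone, i.e. the note A.
sample_rate = 4096
samples = [math.sin(2 * math.pi * 440.0 * t / sample_rate) for t in range(1024)]
chroma = chroma_vector(samples, sample_rate)
dominant = PITCH_CLASSES[chroma.index(max(chroma))]
print(dominant)
```

A full chromagram repeats this for consecutive frames and stacks the resulting vectors over time. In practice one would use a fast Fourier transform rather than the quadratic-time loop above.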

Unfortunately, looking at the chromagrams, no melody is discernible. The only thing that sticks out is the droning ‘E’ in Ball and Biscuit, but this could hardly be called a melody. It appears we need a different tool.

Melody: What is the happiest key?


Though finding specific melodies is a difficult task to automate, we could look at the key in which the melody is played.
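As a rough sketch of how "the happiest key" could be computed, the code below groups tracks by key and averages their lyrical valence. Spotify's audio features report `key` as a pitch class from 0 to 11 (or -1 when no key is detected) and `mode` as 1 for major or 0 for minor. The `lyrical_valence` field, the helper names and the four tracks are hypothetical and only serve the example.

```python
from collections import defaultdict

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def key_name(key, mode):
    """Turn Spotify's numeric key and mode into a readable name."""
    if key < 0:
        return "unknown"  # Spotify uses -1 when no key was detected
    return f"{PITCH_CLASSES[key]} {'major' if mode == 1 else 'minor'}"


def mean_valence_by_key(tracks):
    """Average the (hypothetical) lyrical_valence field per key."""
    groups = defaultdict(list)
    for track in tracks:
        groups[key_name(track["key"], track["mode"])].append(track["lyrical_valence"])
    return {name: sum(values) / len(values) for name, values in groups.items()}


# Made-up tracks, NOT taken from the corpus.
tracks = [
    {"key": 0, "mode": 1, "lyrical_valence": 0.8},
    {"key": 0, "mode": 1, "lyrical_valence": 0.6},
    {"key": 9, "mode": 0, "lyrical_valence": 0.3},
    {"key": 4, "mode": 0, "lyrical_valence": 0.4},
]

means = mean_valence_by_key(tracks)
happiest = max(means, key=means.get)
print(happiest, means)
```

With 24 possible keys and fewer than 300 tracks, some keys will contain only a handful of songs, so the per-key averages should be read with caution.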

Harmony


The troubles with key matching.

Instrumentation

Rhythm


Hypothesis: higher tempo songs tend to be more aggressive and slower songs more sensual.

Let’s put this one to the test. For this hypothesis we’ll label songs with a BPM below the median (< 115.9 BPM) as slow songs, and the remaining songs as fast songs (≥ 115.9 BPM).
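The median split described above can be sketched as follows. The field name `tempo` mirrors Spotify's audio features, while the function name and the four tracks are made up for illustration.

```python
import statistics


def split_by_median_tempo(tracks):
    """Label tracks below the median BPM as slow and the rest as fast."""
    median_bpm = statistics.median([track["tempo"] for track in tracks])
    slow = [track for track in tracks if track["tempo"] < median_bpm]
    fast = [track for track in tracks if track["tempo"] >= median_bpm]
    return median_bpm, slow, fast


# Made-up tracks, NOT taken from the corpus.
tracks = [
    {"name": "track A", "tempo": 80.0},
    {"name": "track B", "tempo": 100.0},
    {"name": "track C", "tempo": 120.0},
    {"name": "track D", "tempo": 140.0},
]

median_bpm, slow, fast = split_by_median_tempo(tracks)
print(median_bpm, [t["name"] for t in slow], [t["name"] for t in fast])
```

Splitting at the median guarantees two groups of roughly equal size, which keeps the later word comparison balanced.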

So far we’ve explored only the lyrical valence score, but not the lyrics themselves. We might gain some new insights if we look at the lyrics directly, so let’s try it. One of the most useful tools for visualizing patterns in textual data is a so-called word cloud, which you can see to the side. The words in blue occur very frequently in fast songs relative to slow songs, and vice versa for the words in red.

Immediately we can see instances that support the hypothesis. Slow-song words include sensual words such as number (as in someone’s phone number), kiss, boy and hot. These are words we would expect to encounter in a love song. What stands out, though, is that love appears among the fast songs. There are also some odd ones out, like bones. As for the fast tracks, we also find what one would expect, e.g. aggressive words like kill, gun and ill. Very noticeably, we also find numerous verbs and filler words. This makes sense: in a high-BPM track the singer (or rapper) has to keep up the pace, and it is easiest for both listener and artist to reuse common verbs and filler words to keep the information stream somewhat limited.

Most of the data in this plot seems to confirm the hypothesis (though there are exceptions, like love among the fast tracks).
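For readers curious how words can be ranked as typical of fast or slow songs, here is one possible approach: compare each word's relative frequency in the two groups. This is an assumption about how such a comparison could be made, not necessarily how the word cloud was produced; the function names and the toy lyrics are made up.

```python
import re
from collections import Counter


def word_frequencies(lyrics_list):
    """Relative frequency of every word across a list of lyrics."""
    counts = Counter()
    for lyrics in lyrics_list:
        counts.update(re.findall(r"[a-z']+", lyrics.lower()))
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}


def distinctive_words(fast_lyrics, slow_lyrics, top_n=2):
    """Return the words most typical of fast songs and of slow songs."""
    fast = word_frequencies(fast_lyrics)
    slow = word_frequencies(slow_lyrics)
    # Positive difference: relatively more common in fast songs.
    diff = {w: fast.get(w, 0.0) - slow.get(w, 0.0) for w in set(fast) | set(slow)}
    ranked = sorted(diff, key=diff.get, reverse=True)
    return ranked[:top_n], ranked[::-1][:top_n]


# Made-up lyrics, NOT taken from the corpus.
fast_lyrics = ["run run run go", "go run now"]
slow_lyrics = ["kiss me slow", "slow kiss tonight"]

fast_words, slow_words = distinctive_words(fast_lyrics, slow_lyrics)
print(fast_words, slow_words)
```

Using relative rather than absolute frequencies keeps the comparison fair when one group contains more words than the other, which is likely here because fast tracks tend to pack in more lyrics.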

Conclusion